
    Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration

    Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties about the impact of committed code changes or if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI with the goal of minimizing the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, time of last execution, and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, Retecs learns to prioritize error-prone test cases higher under the guidance of a reward function and by observing previous CI cycles. By applying Retecs to data extracted from three industrial case studies, we show for the first time that reinforcement learning enables effective, automatic, and adaptive test case selection and prioritization in CI and regression testing.
    Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA'17) (pp. 12-22). ACM.
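    A minimal sketch in Python of the ingredients named above (test duration, execution and failure history, a failure-based reward). All names are hypothetical, and a hand-written scoring heuristic stands in for the learned Retecs policy; this is not the method itself:

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class TestCase:
            name: str
            duration: float                                         # execution time of the last run
            last_results: List[int] = field(default_factory=list)   # 1 = failed, 0 = passed

        def failure_count_reward(schedule: List[TestCase], window: int = 4) -> float:
            # Toy reward: the more recently-failing tests a schedule contains, the better.
            return float(sum(sum(tc.last_results[-window:]) for tc in schedule))

        def naive_priority(tc: TestCase, window: int = 4) -> float:
            # Heuristic stand-in for the learned policy: prefer short, failure-prone tests.
            return sum(tc.last_results[-window:]) / (1.0 + tc.duration)

        suite = [
            TestCase("t_login", duration=2.0, last_results=[0, 1, 1, 0]),
            TestCase("t_report", duration=30.0, last_results=[0, 0, 0, 0]),
            TestCase("t_api", duration=5.0, last_results=[1, 1, 0, 1]),
        ]
        schedule = sorted(suite, key=naive_priority, reverse=True)
        print([tc.name for tc in schedule], failure_count_reward(schedule))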

    Industry–Academia Research Collaboration and Knowledge Co-creation: Patterns and Anti-patterns

    Increasing the impact of software engineering research in the software industry and society at large has long been a high-priority concern for the software engineering community. The problem of two cultures, research conducted in a vacuum (disconnected from the real world), and misaligned time horizons are just some of the many complex challenges standing in the way of successful industry–academia collaborations. This article reports on the experience of research collaboration and knowledge co-creation between industry and academia in software engineering as a way to bridge the research–practice collaboration gap. Our experience spans 14 years of collaboration between researchers in software engineering and the European and Norwegian software and IT industry. Using participant observation and interviews, we collected and subsequently analyzed an extensive record of qualitative data. Drawing upon the findings made and the experience gained, we provide a set of 14 patterns and 14 anti-patterns for industry–academia collaboration, aimed at supporting other researchers and practitioners in establishing and running research collaboration projects in software engineering.

    Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models

    Causal neural network models have shown high levels of robustness to adversarial attacks, as well as an increased capacity for generalisation tasks such as few-shot learning and rare-context classification, compared to traditional neural networks. This robustness is argued to stem from the disentanglement of causal and confounder input signals. However, no quantitative study has yet measured the level of disentanglement achieved by these types of causal models or assessed how it relates to their adversarial robustness. Existing causal disentanglement metrics are not applicable to deterministic models trained on real-world datasets. We therefore utilise metrics of content/style disentanglement from the field of computer vision to measure different aspects of causal disentanglement for four state-of-the-art causal neural network models. By re-implementing these models with a common ResNet18 architecture, we are able to fairly measure their adversarial robustness on three standard image classification benchmarking datasets under seven common white-box attacks. We find a strong association (r = 0.820, p = 0.001) between the degree to which models decorrelate causal and confounder signals and their adversarial robustness. Additionally, we find a moderate negative association between the pixel-level information content of the confounder signal and adversarial robustness (r = -0.597, p = 0.040).
    Comment: 12 pages, 3 figures.
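    The reported association is a correlation across models between a disentanglement score and adversarial robustness; a minimal sketch of such an analysis, with entirely made-up per-model numbers, could look like this:

        # The per-model scores below are invented; only the shape of the analysis mirrors the abstract.
        from scipy.stats import pearsonr

        disentanglement = [0.41, 0.55, 0.62, 0.78]    # hypothetical disentanglement score per model
        robust_accuracy = [0.22, 0.30, 0.35, 0.47]    # hypothetical accuracy under white-box attack

        r, p = pearsonr(disentanglement, robust_accuracy)
        print(f"r = {r:.3f}, p = {p:.3f}")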

    Optimal Minimisation of Pairwise-covering Test Configurations Using Constraint Programming

    Context: Testing highly-configurable software systems is challenging due to the large number of test configurations that have to be carefully selected in order to reduce the testing effort as much as possible while maintaining high software quality. Finding the smallest set of valid test configurations that ensures sufficient coverage of the system's feature interactions is thus the objective of validation engineers, especially when the execution of test configurations is costly or time-consuming. However, this problem is NP-hard in general, and approximation algorithms have often been used to address it in practice. Objective: In this paper, we explore an alternative approach based on constraint programming that allows engineers to increase the effectiveness of configuration testing while keeping the number of configurations as low as possible. Method: Our approach uses a (time-aware) minimisation algorithm based on constraint programming. Given a time budget, our solution generates a minimised set of valid test configurations that ensures coverage of all pairs of feature values (a.k.a. pairwise coverage). The approach has been implemented in a tool called PACOGEN. Results: PACOGEN was evaluated on 224 feature models in comparison with two existing tools based on a greedy algorithm. For 79% of the 224 feature models, PACOGEN generated up to 60% fewer test configurations than the competitor tools. We further evaluated PACOGEN in a case study of a large, highly-configurable industrial video-conferencing software product with a feature model of 169 features, and found 60% fewer configurations compared with the manual approach followed by test engineers. The set of test configurations generated by PACOGEN reduced the time test engineers spent on manual test configuration by 85%, while increasing feature-pair coverage at the same time. Conclusion: Our extensive evaluation concludes that optimal minimisation of pairwise-covering test configurations is efficiently addressed using constraint programming techniques.
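    Purely as an illustration of phrasing pairwise-covering minimisation as a constraint optimisation problem (not PACOGEN's actual time-aware model), the following sketch selects a minimal set of valid configurations of a hypothetical three-feature model using Google OR-Tools CP-SAT:

        from itertools import combinations, product
        from ortools.sat.python import cp_model

        features = ["A", "B", "C"]                       # hypothetical boolean features
        # Valid configurations of a toy feature model with one constraint: A requires B.
        candidates = [cfg for cfg in product([0, 1], repeat=len(features))
                      if not (cfg[0] == 1 and cfg[1] == 0)]

        # All feature-value pairs that occur in at least one valid configuration.
        pairs = {((i, cfg[i]), (j, cfg[j]))
                 for cfg in candidates
                 for i, j in combinations(range(len(features)), 2)}

        model = cp_model.CpModel()
        chosen = [model.NewBoolVar(f"cfg_{k}") for k in range(len(candidates))]

        # Every achievable pair must be covered by at least one chosen configuration.
        for (i, vi), (j, vj) in pairs:
            model.AddBoolOr([chosen[k] for k, cfg in enumerate(candidates)
                             if cfg[i] == vi and cfg[j] == vj])

        model.Minimize(sum(chosen))                      # as few configurations as possible
        solver = cp_model.CpSolver()
        if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
            selected = [candidates[k] for k, var in enumerate(chosen) if solver.Value(var)]
            print(f"{len(selected)} configurations cover all pairs:", selected)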

    Practical Pairwise Testing for Software Product Lines

    One key challenge for software product lines is efficiently managing variability throughout their lifecycle. In this paper, we address the problem of variability in software product line testing. We (1) identify a set of issues that must be addressed to make software product line testing work in practice and (2) provide a framework that combines a set of techniques to solve these issues. The framework integrates feature modelling, combinatorial interaction testing, and constraint programming techniques. First, we extract the variability of a software product line as a feature model with specified feature interdependencies. We then employ an algorithm that, within a given time interval, generates a minimal set of valid test cases covering all 2-way feature interactions. Furthermore, we evaluate the framework on an industrial SPL and show that using the framework saves time and provides better test coverage. In particular, our experiments show that the framework improves industrial testing practice in terms of (i) a 17% smaller set of test cases that are (a) valid and (b) guarantee full 2-way feature coverage (as opposed to the 19.2% 2-way feature coverage of the hand-made test set), and (ii) full flexibility in adjusting test generation to the available testing time.
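    The 2-way coverage figures quoted above correspond to the standard pairwise-coverage metric; a small sketch of how that percentage can be computed for a given test set, with invented feature names and configurations, is:

        from itertools import combinations, product

        def pairwise_coverage(test_set, all_valid_configs):
            # Fraction of achievable feature-value pairs covered by `test_set`.
            def pairs_of(configs):
                return {pair
                        for cfg in configs
                        for pair in combinations(sorted(cfg.items()), 2)}
            achievable = pairs_of(all_valid_configs)
            return len(pairs_of(test_set) & achievable) / len(achievable)

        # Hypothetical example: two features of a configurable product.
        valid = [{"codec": c, "hd": h} for c, h in product(["h264", "vp8"], [True, False])]
        manual = [{"codec": "h264", "hd": True}]
        print(f"2-way coverage of the manual set: {pairwise_coverage(manual, valid):.1%}")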

    Detecting Intentional AIS Shutdown in Open Sea Maritime Surveillance Using Self-Supervised Deep Learning

    In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or the transshipment of illicit products, is a crucial task for coastal administrations. In the open sea, one has to rely on Automatic Identification System (AIS) messages transmitted by on-board transponders, which are captured by surveillance satellites. However, insincere vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions, or restrictive satellite positions. This paper presents a novel approach for the detection of abnormal missing AIS reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts whether a message should be received in the upcoming minute or not. The model then reports detected anomalies by comparing its predictions with what actually happens. Our method can process AIS messages in real time, in particular more than 500 million AIS messages per month, corresponding to the trajectories of more than 60,000 ships. The method is evaluated on one year of real-world data coming from four Norwegian surveillance satellites. Using related research results, we validated our method by rediscovering intentional AIS shutdowns that had already been detected.
    Comment: IEEE Transactions on Intelligent Transportation Systems.
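    The detection step amounts to comparing per-minute reception predictions with actual receptions; the following schematic, with a hypothetical probability threshold and made-up values, illustrates that comparison only (not the paper's transformer model):

        from typing import List

        def flag_missing_reception(pred_prob: List[float], received: List[bool],
                                   threshold: float = 0.9) -> List[int]:
            # Minutes where reception was confidently predicted but no message arrived.
            return [t for t, (p, r) in enumerate(zip(pred_prob, received))
                    if p >= threshold and not r]

        probs = [0.97, 0.95, 0.12, 0.98, 0.96]      # hypothetical predicted reception probabilities
        obs   = [True, True, False, False, False]   # actual reception per minute
        print("suspicious minutes:", flag_missing_reception(probs, obs))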

    Opening the Software Engineering Toolbox for the Assessment of Trustworthy AI

    Trustworthiness is a central requirement for the acceptance and success of human-centered artificial intelligence (AI). To deem an AI system trustworthy, it is crucial to assess its behaviour and characteristics against a gold standard of Trustworthy AI, consisting of guidelines, requirements, or merely expectations. While AI systems are highly complex, their implementations are still based on software. The software engineering community has a long-established toolbox for the assessment of software systems, especially in the context of software testing. In this paper, we argue for the application of software engineering and testing practices to the assessment of trustworthy AI. We draw the connection between the seven key requirements defined by the European Commission's High-Level Expert Group on AI and established procedures from software engineering, and raise questions for future work.

    Adversarial Deep Reinforcement Learning for Improving the Robustness of Multi-agent Autonomous Driving Policies

    Autonomous cars are well known to be vulnerable to adversarial attacks that can compromise the safety of the car and pose danger to other road users. To effectively defend against adversaries, it is necessary not only to test autonomous cars to find driving errors, but also to improve the robustness of the cars to these errors. To this end, in this paper we propose a two-step methodology for autonomous cars that consists of (i) finding failure states in autonomous cars by training an adversarial driving agent, and (ii) improving the robustness of autonomous cars by retraining them with effective adversarial inputs. Our methodology supports testing autonomous cars in a multi-agent environment, where we train and compare adversarial car policies under two custom reward functions in order to test the driving control decisions of autonomous cars. We run experiments in a vision-based, high-fidelity simulated urban driving environment. Our results show that adversarial testing can be used to find erroneous autonomous driving behaviour, followed by adversarial training to improve the robustness of deep reinforcement learning based autonomous driving policies. We demonstrate that autonomous cars retrained using the effective adversarial inputs noticeably improve the performance of their driving policies in terms of reduced collisions and off-road steering errors.
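    As a hypothetical illustration of the general shape of such adversarial reward functions (the paper defines its own two), an adversary can be rewarded for inducing collisions or off-road steering of the car under test; all weights and signal names below are invented:

        def adversarial_reward(ego_collided: bool, ego_offroad: bool,
                               adv_infraction: bool,
                               w_collision: float = 1.0, w_offroad: float = 0.5,
                               w_self_penalty: float = 0.25) -> float:
            reward = 0.0
            if ego_collided:
                reward += w_collision        # adversary forced a collision of the car under test
            if ego_offroad:
                reward += w_offroad          # adversary forced an off-road steering error
            if adv_infraction:
                reward -= w_self_penalty     # discourage trivially reckless adversarial driving
            return reward

        print(adversarial_reward(ego_collided=True, ego_offroad=False, adv_infraction=True))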

    Software Testing for Machine Learning

    Machine learning has become prevalent across a wide variety of applications. Unfortunately, machine learning has also been shown to be susceptible to deception, leading to errors and even fatal failures. This circumstance calls into question the widespread use of machine learning, especially in safety-critical applications, unless we are able to assure its correctness and trustworthiness properties. Software verification and testing are established techniques for assuring such properties, for example by detecting errors. However, the software testing challenges for machine learning are vast and profuse, yet critical to address. This summary talk discusses the current state of the art of software testing for machine learning. More specifically, it discusses six key challenge areas for software testing of machine learning systems, examines current approaches to these challenges, and highlights their limitations. The paper provides a research agenda with elaborated directions for making progress toward advancing the state of the art in testing of machine learning systems.